42 research outputs found

Recommended from our members

Provenance as First Class Cloud Data

Author: Muniswamy-Reddy Kiran-Kumar
Seltzer Margo I.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 02/11/2011

Digital provenance is metadata that describes the ancestry or history of a digital object. Most work on provenance focuses on how provenance increases the value of data to consumers. However, provenance is also valuable to storage providers. For example, provenance can provide hints on access patterns, detect anomalous behavior, and provide enhanced user search capabilities. As the next-generation storage providers, cloud vendors are in a unique position to capitalize on this opportunity to incorporate provenance as a fundamental storage system primitive. To date, cloud offerings have not done so. We provide motivation for providers to treat provenance as first-class data in the cloud and, based on our experience with provenance in a local storage system, suggest a set of requirements that make provenance feasible and attractive.

Engineering and Applied Science

Harvard University - DASH

Making a Cloud Provenance-Aware

Author: Macko Peter
Muniswamy-Reddy Kiran-Kumar
Seltzer Margo I.
Publication venue: USENIX Association
Publication date: 06/10/2011

The advent of cloud computing provides a cheap and convenient mechanism for scientists to share data. The utility of such data is obviously enhanced when the provenance of the data is also available. The cloud, while convenient for storing data, is not designed for storing and querying provenance. In this paper, we present desirable properties for distributed provenance storage systems and present design alternatives for storing data and provenance on Amazon’s popular Web Services platform (AWS). We evaluate the properties satisfied by each approach and analyze the cost of storing and querying provenance in each approach.

Engineering and Applied Science

Harvard University - DASH

Provenance-Aware Sensor Data Storage

Author: Ledlie Jonathan
Braun Uri
Ng Chaki
Holland David A.
Muniswamy-Reddy Kiran-Kumar
Seltzer Margo
Publication venue: IEEE Computer Society
Publication date: 01/01/2005

Sensor network data has both historical and real-time value. Making historical sensor data useful, in particular, requires storage, naming, and indexing. Sensor data presents new challenges in these areas. Such data is location-specific but also distributed; it is collected in a particular physical location and may be most useful there, but it has additional value when combined with other sensor data collections in a larger distributed system. Thus, arranging location-sensitive peer-to-peer storage is one challenge. Sensor data sets do not have obvious names, so naming them in a globally useful fashion is another challenge. The last challenge arises from the need to index these sensor data sets to make them searchable. The key to sensor data identity is provenance, the full history or lineage of the data. We show how provenance addresses the naming and indexing issues and then present a research agenda for constructing distributed, indexed repositories of sensor data.

Engineering and Applied Science

arXiv.org e-Print Archive

Crossref

Harvard University - DASH

Archivio della ricerca - Università di Roma La Sapienza

Archivio istituzionale della ricerca - Università di Padova

Choosing a Data Model and Query Language for Provenance

Author: Braun Uri Jacob
Holland David A
Maclean Diana
Muniswamy-Reddy Kiran-Kumar
Seltzer Margo I.
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 15/05/2012

The ancestry relationships found in provenance form a directed graph. Many provenance queries require traversal of this graph. The data and query models for provenance should directly and naturally address this graph-centric nature of provenance. To that end, we set out the requirements for a provenance data and query model and discuss why the common solutions (relational, XML, RDF) fall short. A semistructured data model is more suited for handling provenance. We propose a query model based on the Lorel query language, and briefly describe how our query language PQL extends Lorel.

Engineering and Applied Science

Harvard University - DASH

Provenance-Aware Sensor Data Storage

Author: Braun Uri
Holland David A.
Ledlie Jonathan
Muniswamy-Reddy Kiran-Kumar
Ng Chaki
Seltzer Margo
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2005

CiteSeerX

Harvard University - DASH

Layering in Provenance Systems

Author: Braun Uri Jacob
Holland David A
Macko Peter
Maclean Diana
Margo Daniel Wyatt
Muniswamy-Reddy Kiran-Kumar
Seltzer Margo I.
Smogor Robin
Publication venue: USENIX Association
Publication date: 06/10/2011

Digital provenance describes the ancestry or history of a digital object. Most existing provenance systems, however, operate at only one level of abstraction: the system call layer, a workflow specification, or the high-level constructs of a particular application. The provenance collectable in each of these layers is different, and all of it can be important. Single-layer systems fail to account for the different levels of abstraction at which users need to reason about their data and processes. These systems cannot integrate data provenance across layers and cannot answer questions that require an integrated view of the provenance. We have designed a provenance collection structure facilitating the integration of provenance across multiple levels of abstraction, including a workflow engine, a web browser, and an initial runtime Python provenance tracking wrapper. We layer these components atop provenance-aware network storage (NFS) that builds upon a Provenance-Aware Storage System (PASS). We discuss the challenges of building systems that integrate provenance across multiple layers of abstraction, present how we augmented systems in each layer to integrate provenance, and present use cases that demonstrate how provenance spanning multiple layers provides functionality not available in existing systems. Our evaluation shows that the overheads imposed by layering provenance systems are reasonable.

Engineering and Applied Science

Harvard University - DASH

Layering in Provenance-Aware Storage Systems

Author: Barillari Joseph
Braun Uri
Holland David A
Holland Stephen D.
Maclean Diana
Muniswamy-Reddy Kiran-Kumar
Seltzer Margo I.
Publication date: 20/11/2015

Digital provenance describes the ancestry or history of a digital document. Provenance provides answers to questions such as: “How does the ancestry of these objects differ?” “Are there source code files tainted by proprietary software?” “How was this object created?” Prior systems used to collect and maintain provenance operate within a single layer of abstraction: the system call boundary, a workflow specification language, or a domain-specific application level. The provenance collected at each of these layers of abstraction is different, and all of it is important at one time or another. All of these solutions fundamentally fail to account for the different layers of abstraction at which users need to reason about their data and processes. None of these systems supports queries across different layers of abstraction to answer a question such as “The calculated values in my spreadsheet have changed. Is this due to a change in the spreadsheet, a difference in the spreadsheet application, the libraries being used, or the operating system being used?” We present an architecture for provenance collection that facilitates the integration of provenance across multiple layers of abstraction and across network boundaries. We show how the need to support provenance collection at multiple layers drives the architecture. We present provenance-aware use cases from the field of thermography and quantify system overheads, showing that we can provide new functionality with acceptable overhead.

Engineering and Applied Science

Harvard University - DASH

Practical whole-system provenance capture

Author: Akoush Sherif
Amir-Mohammadian Sepehr
Balakrishnan Nikilesh
Bates Adam
Bauer Mick
Berger Stefan
Chan Sheung Chi
Davidson Susan B
Gonzalez Joseph E
Greenwood Mark
Gulzar Muhammad Ali
Han Xueyuan
Hoffman Steve
Katcher Jeffrey
Kyrola Aapo
Lee Brian
Lerner Barbara
Macko Peter
Morris James
Morris Thomas
Moyer Thomas
Muniswamy-Reddy Kiran-Kumar
Murta Leonardo
Pasquier Thomas
Povey Dean
Sailer Reiner
Schaufler Casey
Somayaji Anil
Xie Yulai
Zanussi Tom
Publication venue: Association for Computing Machinery (ACM)
Publication date: 24/09/2017

Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While there have been several prior whole-system provenance systems that captured a comprehensive, systemic and ubiquitous record of a system’s behavior, none have been widely adopted. They either A) impose too much overhead, B) are designed for long-outdated kernel releases and are hard to port to current systems, C) generate too much data, or D) are designed for a single system. CamFlow addresses these shortcomings by: 1) leveraging the latest kernel design advances to achieve efficiency; 2) using a self-contained, easily maintainable implementation relying on a Linux Security Module, NetFilter, and other existing kernel facilities; 3) providing a mechanism to tailor the captured provenance data to the needs of the application; and 4) making it easy to integrate provenance across distributed systems. The provenance we capture is streamed and consumed by tenant-built auditor applications. We illustrate the usability of our implementation by describing three such applications: demonstrating compliance with data regulations; performing fault/intrusion detection; and implementing data loss prevention. We also show how CamFlow can be leveraged to capture meaningful provenance without modifying existing applications.

Engineering and Applied Science

arXiv.org e-Print Archive

Crossref

Harvard University - DASH

Explore Bristol Research

Xanthus: Push-button Orchestration of Host Provenance Data Collection

Author: Balakrishnan Nikilesh
Bates Adam
Gregg Brendan
Guo Philip J
Han Xueyuan
Hassan Wajih Ul
Jiang Xuxian
Kennedy David
Muniswamy-Reddy Kiran-Kumar
National Academies of Sciences Engineering, and
Pohly J
Spillane P
Publication venue: Association for Computing Machinery (ACM)
Publication date: 10/05/2020

Host-based anomaly detectors generate alarms by inspecting audit logs for suspicious behavior. Unfortunately, evaluating these anomaly detectors is hard. There are few high-quality, publicly available audit logs, and there are no pre-existing frameworks that enable push-button creation of realistic system traces. To make trace generation easier, we created Xanthus, an automated tool that orchestrates virtual machines to generate realistic audit logs. Using Xanthus' simple management interface, administrators select a base VM image, configure a particular tracing framework to use within that VM, and define post-launch scripts that collect and save trace data. Once data collection is finished, Xanthus creates a self-describing archive, which contains the VM, its configuration parameters, and the collected trace data. We demonstrate that Xanthus hides many of the tedious (yet subtle) orchestration tasks that humans often get wrong; Xanthus avoids mistakes that lead to non-replicable experiments.

Comment: 6 pages, 1 figure, 7 listings, 1 table, worksho

arXiv.org e-Print Archive

Crossref

Explore Bristol Research

Deciding How to Store Provenance

Author: Muniswamy-Reddy Kiran-Kumar
Publication date: 12/11/2015

The provenance of a file is metadata pertaining to the history of the file. Provenance, unlike normal metadata stored in file systems, is retrieved primarily by running queries. This implies that provenance has to be indexed and should have a query interface. We believe that databases are the most appropriate place to store provenance as they provide both indexing and query capabilities. The goal of this paper is to explore the most appropriate schema and database technology for storing provenance. In the paper we discuss the different possible schemas for storing provenance and the tradeoffs in choosing each of the schemas. We then characterize the behavior of some of the popular database architectures under provenance recording/querying workloads. The database architectures that we considered are: RDBMS, Schemaless Embedded Databases (Berkeley DB), XML, and LDAP. Finally, we present preliminary performance results for the database architectures for provenance recording and some common provenance queries. Our results indicate that schemaless embedded databases have the best performance under most provenance workloads. The results also indicate that RDBMS has the best space utilization under most provenance workloads.

Engineering and Applied Science

Harvard University - DASH